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Abstract — We introduce a novel data-driven order reduction 
method for nonlinear control systems, drawing on recent progress 
in machine learning and statistical dimensionality reduction. The 
method rests on the assumption that the nonlinear system behaves 
linearly when lifted into a high (or infinite) dimensional feature 
space where balanced truncation may be carried out implicitly. 
This leads to a nonlinear reduction map which can be combined 
with a representation of the system belonging to a reproducing 
kernel Hilbert space to give a closed, reduced order dynamical 
system which captures the essential input-output characteristics 
of the original model. Empirical simulations illustrating the 
approach are also provided. 

I. Introduction 

Model reduction of controlled dynamical systems has been a 
long standing, and as yet, unsettled challenge in control theory. 
The benefits are clear: a low dimensional approximation of a 
high dimensional system can be manipulated with a simpler 
controller, and can be simulated at lower computational cost. 
A complex, high dimensional system may even be replaced 
by a simpler model all together leading to significant cost 
savings, as in circuit design, while the "important variables" of 
a system might shed light on underlying physical or biological 
processes. Reduction of linear dynamical systems has been 
treated with some success to date. As we describe in more 
detail below, model reduction in the linear case proceeds by 
reducing the dimension of the system with an eye towards 
preserving its essential input-output behavior, a notion directly 
related to "balancing" observability and controllability of the 
system. The nonlinear picture, however, is considerably more 
involved. 

In this paper we propose a scheme for balanced model- 
order reduction of general, nonlinear control systems. A key, 
and to our knowledge, novel point of departure from the 
literature on nonlinear model reduction is that our approach 
marries approximation and dimensionality reduction methods 
known to the machine learning and statistics communities with 
existing ideas in linear and nonlinear control. In particular, we 
apply a method similar to kernel PCA as well as function 
learning in Reproducing Kernel Hilbert Spaces (RKHS) to 
the problem of balanced model reduction. Working in RKHS 
provides a convenient, general functional-analytical framework 
for theoretical understanding as well as a ready source of 
existing results and error estimates. The approach presented 



here is also strongly empirical, in that observability and con- 
trollability, and in some cases the dynamics of the nonlinear 
system are estimated from simulated or measured trajectories. 
This emphasis on the empirical makes our approach broadly 
applicable, as the method can be applied without having to 
tailor anything to the particular form of the dynamics. 

The approach we propose begins by constructing empirical 
estimates of the observability and controllability Gramians in 
a high (or possibly infinite) dimensional feature space. The 
Gramians are simultaneously diagonalized in order to identify 
directions which, in the feature space, are both the most 
observable and the most controllable. The assumption that 
a nonlinear system behaves linearly when lifted to a feature 
space is far more reasonable than assuming linearity in the 
original space, and then carrying out the linear theory hoping 
for the best. Working in the high dimensional feature space 
allows one to perform linear operations on a representation 
of the system's state and output which can capture strong 
nonlinearities. Therefore a system which is not model re- 
ducible using existing methods, may become reducible when 
mapped into such a nonlinear feature space. This situation 
closely parallels the problem of linear separability in data 
classification: A dataset which is not linearly separable might 
be easily separated when mapped into a nonlinear feature 
space. The decision boundary is linear in this feature space, 
but is nonlinear in the original data space. 

Nonlinear reduction of the state space already opens the 
door to the design of simpler controllers, but is only half of 
the picture. One would also like to be able to write a closed, re- 
duced dynamical system whose input-output behavior closely 
captures that of the original system. This problem is the focus 
of the second half of our paper, where we again exploit helpful 
properties of RKHS in order to provide such a closed system. 

The paper is organized as follows. In the next section 
we provide the relevant background for model reduction and 
balancing. We then adapt and extend balancing techniques 
described in the background to the current RKHS setting in 
SectionU!!] Section|lV]then proposes a method for determining 
a closed, reduced nonlinear control system in light of the 
reduction map described in Section|III] Finally, SectionlVlpro- 
vides experiments illustrating an application of the proposed 
methods to a specific nonlinear system. 



II. Background 

Several approaches have been proposed for the reduction 
of linear control systems in view of control, but few exist for 
finite or infinite-dimensional controlled nonlinear dynamical 
systems. For linear systems the pioneering "Input- Output 
balancing" approach proposed by B.C. Moore observes that 
the important states are the ones that are both easy to reach and 
that generate a lot of energy at the output. If a large amount 
of energy is required to reach a certain state but the same 
state yields a small output energy, the state is unimportant 
for the input-output behavior of the system. The goal is then 
to find the states that are both the most controllable and 
the most observable. One way to determine such states is 
to find a change of coordinates where the controllability and 
observability Gramians (which can be viewed as a measure 
of the controllability and the observability of the system) 
are equal and diagonal. States that are difficult to reach and 
that don't significantly affect the output are then ignored 
or truncated. A system expressed in the coordinates where 
each state is equally controllable and observable is called its 
balanced realization. 

A proposal for generalizing this approach to nonlinear 
control systems was advanced by J. Scherpen 11201 . where 
suitably defined controllability and observability energy func- 
tions reduce to Gramians in the linear case. In general, to 
find the balanced realization of a system one needs to solve 
a set of Hamilton-Jacobi and Lyapunov equations (as we will 
discuss below). Moore 1 15] proposed an alternative, data-based 
approach for balancing in the linear case. This method uses 
samples of the impulse response of a linear system to construct 
empirical controllability and observability Gramians which 
are then balanced and truncated using Principal Components 
Analysis (PCA, or POD). This data-driven strategy was then 
extended to nonlinear control systems with a stable linear 
approximation by Lall et al. ifTTl . by effectively applying 
Moore's method to a nonlinear system by way of the Galerkin 
projection. Despite the fact that the balancing theory under- 
pinning their approach assumes a linear system, Lall and 
colleagues were able to effectively reduce some nonlinear 
systems. 

Phillips ifTHIl et al. has also studied reduction of nonlinear 
circuit models in the case of linear but unbalanced coordinate 
transformations and found that approximation using a poly- 
nomial RKHS could afford computational advantages. Gray 
and Verriest mention in [6| that studying algebraically defined 
Gramian operators in RKHS may provide advantageous ap- 
proximation properties, though the idea is not further explored. 
Finally, Coifman et al. [3 | discuss reduction of an uncontrolled 
stochastic Langevin system. There, eigenfunctions of a combi- 
natorial Laplacian, built from samples of trajectories, provide 
a set of reduction coordinates but does not provide a reduced 
system. This method is related to kernel principal components 
(KPCA) using a Gaussian kernel, however reduction in this 
study is carried out on a simplified linear system outside the 
context of control. 



In the following section we review balancing of linear and 
nonlinear systems as introduced in [15] and |20|. 

A. Balancing of Linear Systems 
Consider a linear control system 

x = Fx + Gu, n . 
y = Hx, ' (i) 

where (F, G) is controllable, (F, H) is observable and F is 
Hurwitz. We define the controllability and the observability 
Gramians as, respectively, 

W c = e Ft GG T e pTt dt, 
W = f™e FTt H T He Ft dt. 

These two matrices can be viewed as a measure of the 
controllability and the observability of the system [15]. For 
instance, consider the past energy |20|, L c (xo), defined as the 
minimal energy required to reach xo from in infinite time 

i r° 

L c (x ) = inf - / \\u{t)\\ 2 dt, (2) 

u£L 2 ( — oo,0), Z J_ oc 
x(— oo)— 0,a;(0)— xq 

and the future energy [20|, L (xq), defined as the output 
energy generated by releasing the system from its initial state 
x(to) = xo, and zero input u(t) = for t > 0, i.e. 

1 f°° 

L o {x ) = -J \\y(t)\\ 2 dt, (3) 

for x(to) = xo and u(t) = 0, t > 0. In the linear case, it can be 
shown that L c (xo) — ^XqW^Xq, and L (xq) = ^XqWoXq. 
The columns of W c span the controllable subspace while the 
nullspace of W coincides with the unobservable subspace. As 
such, W c and W (or their estimates) are the key ingredients 
in many model reduction techniques. It is also well known 
that W c and W satisfy the Lyapunov equations IfTSI 

FW C + W C F T = -GG T , 
F T W Q + W F = -H T H. 

Several methods have been developed to solve these equations 
directly lfl2l. lfl3l. 

The idea behind balancing is to find a representation where 
the system's observable and controllable subspaces are aligned 
so that reduction, if possible, consists of eliminating uncontrol- 
lable states which are also the least observable. More formally, 
we would like to find a new coordinate system such that 

W c = W = E = diag{(Ti ) --- ,a n ], 

where <7i > a% > ■ ■ ■ > a n > 0. If (F, G) is controllable 
and (F, H) is observable, then there exists a transformation 
such that the state space expressed in the transformed co- 
ordinates (TFT^^GtHT- 1 ) is balanced and TW C T T = 
T~^W T~ l = S. Typically one looks for a gap in the singular 
values {o~i} for guidance as to where truncation should occur. 
If we see that there is a A: such that ^> ak+i, then the states 
most responsible for governing the input-output relationship of 
the system are (xi, • • • , x^) while (x^+i, ■ ■ ■ , x n ) are assumed 
to make negligible contributions. 



Although several methods exist for computing T lfl2ll . |fl3ll , 
the general idea is to compute the Cholesky decomposition 
of W so that W = ZZ T , and form the SVD UT, 2 U T of 
Z T W C Z. Then T is given by T — Y^U^Z- 1 . We also note 
that the problem of finding the coordinate change T can be 
seen as an optimization problem [1] of the form 

mm trace [TW C T* + T~*W T~ 1 ]. 

B. Balancing of Nonlinear Systems 

In the nonlinear case, the energy functions L c and L a in 
© and (01 are obtained by solving both a Lyapunov and a 
Hamilton-Jacobi equation. Here we follow the development 
of Scherpen EOl . Consider the nonlinear system 



f( x ) + YT=i9i{x)ui 
h(x), 



(4) 



with x £ R", u £ R m , y £ R p , /(0) = 0, &(()) = for 
1 < i < m, and h(0) = 0. Moreover, assume the following 
Hypothesis. 

Hypothesis H: The linearization of (0]i around the origin is 
controllable, observable and F = §^| x =o is asymptotically 
stable. 

Theorem 2.1: [20 1 If the origin is an asymptotically stable 
equilibrium of f(x) on a neighborhood W of the origin, then 
for all x £ W, L (x) is the unique smooth solution of 



1 



{x)f(x) + -h , (x)h(x) =0, L o (0) = (5) 



under the assumption that (f5]) has a smooth solution on W. 
Furthermore for all x £ W, L c (x) is the unique smooth 
solution of 

^(x)f(x)+^(x)g(x) 9 T (x)^(x) = 0, L c (0) = 

(6) 

under the assumption that © has a smooth solution L c on W 
and that the origin is an asymptotically stable equilibrium of 

-{f{x) + g( x )g T (x)^(x)) onW. ' 

With the controllability and the observability functions on 
hand, the input-normal/output-diagonal realization of sys- 
tem (0]i can be computed by way of a coordinate transfor- 
mation. More precisely, 

Theorem 2.2: ll20l Consider system under Hypothesis 
H and the assumptions in Theorem I2.ll Then, there exists a 
neighborhood W of the origin and coordinate transformation 
x = (p(z) on W converting the energy functions into the form 

L c ((p(z)) = ^z T z, 

1 - 

L (ip{z)) = -^z^criiz,) 2 , 

i=l 

where a\{x) > <?2{x)> ■ ■ ■ > cr n (x). The functions Oj(-) are 
called Hankel singular value functions. 
Analogous to the linear case, the system's states can be sorted 
in order of importance by sorting the singular value functions, 
and reduction proceeds by removing the least important states. 



In the above framework for balancing of nonlinear systems, 
one needs to solve (or numerically evaluate) the PDEs 10, (0 
and compute the coordinate change x — ip(z), however there 
are no systematic methods or tools for solving these prob- 
lems. Various approximate solutions based on Taylor series 
expansions have been proposed (9), (8), [2]. Newman ifT&j 
introduces a statistical approximation based on exciting the 
system with white Gaussian noise and then computing the 
balancing transformation using an algorithm from differential 
topology. As mentioned earlier, an essentially linear empirical 
approach was proposed in ifTTI . In this paper, we combine as- 
pects of both data-driven approaches and analytic approaches 
by carrying out balancing in a suitable RKHS. 

III. Empirical Balancing of Nonlinear Systems in 

RKHS 

We consider a general nonlinear system of the form 

x = f(x,u) 



y = h(x) (n 

with x S 1™, u e R m , y € R p , /(0, 0) = 0, and h(0) = 0. 
Let Tl(x ) = {x 1 e R" : 3u £ L oc) (R,R m ) and 3T e 
[0, oo) such that x(0) — xo and x(T) = x'} be 
the reachable set from the initial condition a;(0) = x . We 
assume that the system is zero-state observable, and that the 
linearization of (Q around the origin is controllable. We also 
assume that the origin of x — f(x, 0) is asymptotically stable. 

We treat the problem of estimating the observability and 
controllability Gramians as one of estimating an integral 
operator from data in a reproducing kernel Hilbert space 
(RKHS) yj. Our approach hinges on the key modeling as- 
sumption that the nonlinear dynamical system is linear in 
an appropriate high (or possibly infinite) dimensional lifted 
feature space. Covariance operators in this feature space and 
their empirical estimates are the objects of primary importance 
and contain the information needed to perform model reduc- 
tion. In particular, the (linear) observability and controllability 
Gramians are estimated and diagonalized in the feature space, 
but capture nonlinearities in the original state space. The 
reduction approach we propose adapts ideas from kernel PCA 
(KPCA) iBTl and is driven by a set of simulated or sampled 
system trajectories, extending and generalizing the work of 
Moore [Q3] and Lall et al. JTT|. 

A. Definitions 

In the development below we lift state vectors of the system 
into a reproducing kernel Hilbert space [2 |, H, endowed with 
a symmetric positive definite kernel function K : X x X — > R 
which we assume here to be continuous and bounded by 
k = sup xeX y/ K(x, x) < 00. In particular, we make use of 
the following important properties: For all / £ H, f(x) = 
(fjKx}^,, where K x :— K(x,-). This is the reproducing 
property. Second, to any RKHS we can associate a feature 
map $ : X — > T satisfying <&(x')) H = K(x,x'). For 

example, we can take := K x in which case T = T~L - 

the "feature space" is the RKHS. We will further assume that 
% is always separable. 



B. Empirical Gramians 

Following [15 1, we estimate the controllability Gramian by 
exciting each coordinate of the input with impulses while 
setting xq = 0. One can also further excite using rotations 
of impulses as suggested in ifTTl . however for simplicity 
we consider only the original signals proposed in ITSl . Let 
u l (t) — 6(t)ei be the i-th excitation signal, and let be 
the corresponding response of the system. Form the matrix 
X(t) = [x l (t) ■ ■ ■ x m (t)] G M" xm , so that X{t) is seen as a 
data matrix with column observations given by the respective 
responses x l (t). Then W c G R" xn is given by 

1 f°° 

W c =- X{t)X(t) T . 
m Jo 

We can approximate this integral by sampling the matrix 
function X{t) within a finite time interval [0, T] assuming 
the regular partition {ti}fL 1 ,ti — (T/N)i. This leads to the 
empirical controllability Gramian 

rp n 

^- = ^£*(W*<) T - 

i—l 

As described in ifTSI . the observability Gramian is estimated 
by fixing u(t) — 0, setting x a = for i = 1, ...,n, and 
measuring the corresponding system output responses y l (i). 
As before, assemble the responses into a matrix Y(t) = 
[y x (t) • • • y n (t)] G W xn . The observability Gramian W a G 
l" x ™ and its empirical counterpart W are given by 

1 /*°° T SL 

W = - Y(t) T Y(t) , W = —J2 Y(ti)Y(ti) T 

where Y(t) = Y(t) T . The matrix Y{t t ) G R nxp can be 
thought of as a data matrix with column observations 

d j (t i ) = (i#(ti),...,i/"(*<)) T eR n , j = 1,...,p, (8) 

so that dj(ti) corresponds to the response at time tj of the 
single output coordinate j to each of the (separate) initial 
conditions xq = e&, k = 1, . . . , n. This convention will lead 
to greater clarity in the steps that follows. 

C. Kernel PCA 

Kernel PCA ED generalizes linear PCA by carrying out 
PCA in a high dimensional feature space defined by a feature 
map $ : R" — » T. Taking the feature map = K x and 

given the set of data x := {xi]f =l G 1™, we can consider PCA 
in the feature space by simply working with the covariance of 
the mapped vectors, 

1 N 

i=l 

where &(xi) g) $(xj) = (^(xi), ■) $>(xi) denotes the tensor 
product between two vectors in H. We will assume the data 
are centered in the feature space so that £\ &(xi) = 0. If not, 
data may be centered according to the prescription in l2p . 
The principal subspaces are computed by diagonalizing C x , 



however as is shown in ETI . one can equivalently form the 
matrix K G M. NxN of kernel products (if)-y = K(xi,Xj) for 
i, j = 1, . . . , N, and solve the eigenproblem Ka = NXa. 
If C x Vi — XiVi, then we have that Vi = ^a.i where ^> := 
($(xi) • ■ ■ $(a;jv)), and the non-zero eigenvalues of K and 
C x coincide. The eigenvectors a, of K are then normalized 
so that the eigenvectors Vi of C x have unit norm in the feature 
space, leading to the condition ||c*i|| 2 = A^ 1 . Assuming this 
normalization convention, sort the eigenvectors according to 
the magnitudes of the corresponding eigenvalues in descending 
order, and form the matrix A q = \oci ■ ■ ■ aj , 1 < q < 
min(n, N). Similarly, form the matrix V q = [vi ■ • • v q ~\ , 1 < 
q < n of sorted eigenvectors of C x . The first q principal 
components of a vector x = $(£) in the feature space are 
then given by V q x. It can be shown however (see [21]) that 
principal components in the feature space can be computed 
in the original space with kernels using the map II(a;) := 
Ajh(x), where k(a;) = (K(x, X\), . . . , K(x, xjv)) ■ 

D. Model Order Reduction Map 

The method we propose consists, in essence, of collecting 
samples and then performing a process similar to "simulta- 
neous principal components analysis" on the controllability 
and observability Gramian estimates in the (same) RKHS. As 
mentioned above, given a choice of the kernel K defining 
a RKHS H, principal components in the feature space can 
be computed implicitly in the original input space using K. 
Because we will find non- orthogonal coordinates in the feature 
space in which the Gramians become simultaneously diagonal, 
the process is not strictly speaking PCA, and the favorable 
properties associated with an orthonormal basis are no longer 
available. We will, however, continue to refer to the process 
of diagonalizing a covariance matrix as (K)PCA. 

Turning to the controllability Gramian (the case of the 
observability Gramian is analogous), first note that W c can 
be viewed as the sample covariance of a collection of N ■ m 
vectors, scaled by T: 

rp N rp N m 

i—l i—l j—1 

Thus we can form the controllability kernel matrix K c G 
R NmxNm of ker nel products (K c ) tj = K{xi,Xj) for i,j = 
1, . . . ,Nm in order to carry out PCA in the feature space, 
where we have re-indexed the set of vectors {x k (tt)} to 
use a single linear index. Similarly, we can compute the 
observability kernel matrix K G M. NpxNp consisting of the 
pairwise kernel products of the collection of data vectors 
described in (d). Ordinarily, Nm,Np 3> n and K C ,K will 
be rank deficient. 

We assume here for simplicity that the number of input 
excitation signals m is equal to the dimension of the output p 
so that the number of samples N taken from the output and 
state trajectories can be the same, leading to kernel matrices 
K c and K Q of the same size. If one adopts the set of input 
excitations {u l (t)} as above, then an alternative although more 



restrictive assumption can be that the number of inputs to the 
system is equal to the number of outputs. Then the pair K c , K a 
is simultaneously diagonalized by taking the (reduced) SVD of 
Kc /2 K Kc /2 so that K l J 2 K K l J 2 = UY?U T . Conjugation 
by T = E 1//2 £/ T \/ Kc diagonalizes K c and conjugation by 
T~ T = VtJu t Kc^ 2 diagonalizes K , where denotes 
the pseudoinverse of X. Finally, the order of the model 
is reduced by discarding small eigenvalues {Y,a}™ =q+1 , and 
projecting onto the subspace associated with the first q < n 
largest eigenvalues. This leads to the state-space reduction map 
II : M" ->• W given by 

IL(x) = Tjk c (x), xel" (10) 

where 

kcfr) := (K(x,x 1 (h)) i ...,K(x,x m {t N ))) T . (11) 
IV. Closed Dynamics of the Reduced System 

Given the nonlinear state space reduction map II : M" — > 
W, a remaining challenge is to construct a corresponding 
(reduced) dynamical system on the reduced state space which 
well approximates the input-output behavior of the original 
system on the original state space. Setting x r — H(x) and 
applying the chain rule, 

±r = {Jn(x)f(x, u)) | x=n -i (xr) • (12) 

However we are faced with the difficulty that the map II is not 
in general injective (even if q — n), and moreover one cannot 
guarantee that an arbitrary point in the RKHS has a non- 
empty preimage under $ [14|. We propose an approximation 
scheme to get around this difficulty: The dynamics / will 
be approximated by an element of an RKHS defined on the 
reduced state space. When / is assumed to be known explicitly 
it can be approximated to a high degree of accuracy. An 
approximate, least-squares notion of 'TI -1 " will be given to 
first or second order via a Taylor series expansion, but only 
where it is strictly needed - and at the last possible moment 
- so that a first or second order approximation will not be 
as crude as one might suppose. We will also consider, as 
an alternative, a direct approximation of Jn(n~ 1 (a; r )) which 
takes into account further properties of the reproducing kernel 
as well as the fact that the Jacobian is to be evaluated at 
x = n _1 (x r ) in particular. In both cases, the important ability 
of the map n to capture strong nonlinearities will not be 
significantly diminished. 

A. Representation of the dynamics in RKHS 

The vector-valued map / : I" x l m 4 I" can be 
approximated by a composing a set of n regression functions 
(one for each coordinate) f t : W xm -> M in an RKHS, 
with the reduction map n. It is reasonable to expect that 
this approximation will be better than directly computing 
/(n _1 (a; r ), u) using, for instance, a Taylor expansion notion 
of "LI -1 ", which may ignore important nonlinearities at a stage 
where crude approximations must be avoided. 



Let x = H(x) denote a reduced state variable, and con- 
catenate the input examples Xj = II (xj) € M. q ,Uj £ W n so 
that Zj — (xj,Uj) £ R <?xm , and {(fi(xj, Uj), Zj)}j—i is a set 
of input-output training pairs describing the i-th coordinate 
of the map (x,u) H > f(x,u). The training examples should 
characterize "typical" behaviors of the system, and can even 
re-use those trajectories simulated in response to impulses for 
estimating the Gramians above. We will seek the function 
fiEH which minimizes 

e 

3=1 

where A; here is a regularization parameter. We have chosen 
the square loss, however other suitable loss functions may be 
used. It can be shown l22l that in this case /j takes the form 

fi( z ) = Hj=i c )KH z i z j)i * = where defines 

the RKHS Hf (and is unrelated to K used to estimate the 
Gramians). Note that although our notation takes the RKHS for 
each coordinate function to be the same, in general this need 
not be true: different kernels may be chosen for each function. 
Here the {c*} comprise a set of coefficients learned using the 
regularized least squares (RLS) algorithm. The kernel family 
and any hyper-parameters can be chosen by cross-validation. 
For notational convenience we will further define the vector- 
valued empirical feature map 

(y(x,u)) z :=Kf((x,u),Zi) 

for i = l f ...,£. In this notation /j(LT(a;),ti) = cjh^(x,u) 
where (cj)j = c*. 

A broad class of systems seen in the literature l20ll are 
also characterized by separable dynamics of the form x = 
f( x ) + Eti 9i{ x ) u i- I n this case one need only estimate 
the functions / and from examples {(H(xj), f(xj))}j and 
{(U(x 3 ) ig ( X] ))} 3 . ' 

B. Approximation of the Jacobian Contribution 

We turn to approximating the component Jn(n _1 (x r )) 
appearing in Equation ( TTSi i. 

1) Inverse-Taylor Expansion: A simple solution is to com- 
pute a low-order Taylor expansion of n and then invert it 
using the Moore-Penrose pseudoinverse to obtain the ap- 
proximation. For example, consider the first order expansion 
II(x) ~ n(a) + Jn(a)(x — a). Then we can approximate 
n~ 1 (a; r ) (in the first-order, least-norm sense) as 

n _1 (x r ) := (Ma))\x r - n(o)) + a. (13) 

We may start with a = Xq, but periodically update the 
expansion in different regions of the dynamics if desired. A 
good expansion point could be the estimated preimage of x r (t) 
returned by the algorithm proposed in |10|. 

2) Exploiting Kernel Properties: For certain choices of the 
kernel K defining the Gramian feature space H, one can ex- 
ploit the fact that K x and its derivative bear a special relation- 
ship, and potentially improve the estimate for Jn(n _1 (x r .)). 
Perhaps the most commonly used off-the-shelf kernel families 



are the polynomial and Gaussian families. For any two kernels 
with hyperparameters p and q (respectively) in one of these 
classes, we have that K p = (K q y/i. We'll consider the 
polynomial kernel of degree d, K d (x,y) := (1 + (x,y)) d 
in particular; the Gaussian case can be derived using similar 
reasoning. For a polynomial kernel we have that 

dK ff ] = d^-i(^)y T = d(K d (x,y))^ L y T . 

Recalling that K d (x,y) = (&(x), <&(y)) H and x r = H(x) = 
V q &(x), if II was invertible then we would have 



dK d (x, y) 



Ox 



d-l 



x=n- 1 (x r ) 



d(($on- 1 )( Ir ),$( !/ )) d % p 



The map II is not injective however, and in addition the fibers 
of $ may be potentially empty, so we must settle for an 
approximation. It is reasonable then to define ($ o II -1 )(av) 
as the solution to the convex optimization problem 



ll^ll- 



mm 



subj. to \\V q z 



= 0. 



(14) 



If a point z E H has a pre-image in R" this definition 
is consistent with composing $ with the formal definition 
= {x e M" I ${x) = z} and noting that in this 
case II o = V q T (3> o <i) _1 ) = V q z. Furthermore, a 

trajectory x r (t) of the closed dynamical system on the reduced 
statespace need not (and may not) have a counterpart in the 
original statespace by virtue of the way in which "II -1 " is used 
in our formulation of the reduction map and corresponding 
reduced dynamical system. 

One will recognize that the solution z* to ( TBI is just the 
Moore-Penrose pseudoinverse z* = (V q Jyx r . Inserting this 
solution into the feature map representation of a kernel K 
gives the following definition for K(Il^ 1 (x r ), y): 

K(n-\x r ), y) = (($ o n- 1 )(x r )^(y)) n 

= {(V q T )^x r ^(y)) n = {x r ,Vjt>(y)) Rk 

= (x r ,(v q T v q y 1 v q My)) 
= (x r) (v q T v q )- 1 u( y )) 

= (x r ,(TjK c T q )- 1 U(y)) 
= (x r ,(TjT q Z q )- 1 Il(y)). 

Substituting into the derivative for a polynomial kernel K = 

K d gives 



dK d (x,y) 



dx 



x=n- 1 (x r ) 



= d(x r ,(T q T T q Z q )- 1 Tl(y)) d ^ 



which immediately gives an expression for Jn(II~ 1 (x r )). 
Note that this approximation is global in the sense that the 
q x q matrix inverse (T g T T g E g ) _1 need only be computed 
once; no updating is required during simulation of the closed 
system. 



C. Reduced System Dynamics 

Given an estimate f(ll(x),uj of f(x,u) in the RKHS Hf 
and a notion of Jn(n _1 (a; r )) from above, we can write down 
a closed dynamical system on the reduced statespace. We have 

x r « (J n {x)f(tt(x),u)) 

~ ( J n( x ))\x=n-i(x r ) CTkf ( Xr > u ") 
RiTjj k (ir l (xr))C T kf(x r ,u) (15) 

where C is a matrix with the vectors Cj as its rows, and 
Jk is the Jacobian of the empirical feature map defined in 
Equation ( fTTT i. Here the expression Ji(;(ll _1 (x r )) should be 
interpreted as notation for either of the Jacobian approxima- 
tions suggested in Section HV-BI 

Equation ( fl3T > is seen to give a closed nonlinear control 
system expressed solely in terms of the reduced variable 
x r E R q : 

(x r = T q T j k (n-\x r ))c T y(x r ,u) 

{ y = h( x r) 

where the map h o II modeling the output function h : M" — > 
M. p is estimated as described immediately below. Although 
the "true" reduced system does not actually exist due to non- 
injectivity of the feature map in many situations one can 
expect that the above system will capture the essential input- 
output behavior of the original system. We leave a precise 
analysis of the error in the approximations appearing in (fT~5b 
to future work. 

D. Outputs of the Reduced System 

Analogous to the case of the dynamics /, we are faced with 
two possibilities for approximating y = /i(ll _1 (x r )) . We can 
apply the Taylor approximation II -1 , or as in Section HV-AI 
we can estimate a map (h o II) : M. n — > M. p , x r H> y 
from the reduced state space to the output space directly, 
using RKHS methods. Given samples {H(xj), yj}j—i, each 
coordinate function (hi) is given in the familiar form 
hi(TL(x)) = ^=i^^ n (a;),n(a; J )), where K h is the 
kernel chosen to define the RKHS, and may be different for 
each coordinate. It should be noted that just given the state 
space reduction map n, one can immediately compare the 
output of the system defined by h(x r ) to the original system 
without defining a closed dynamics as above. In fact with II 
and h one can design a simpler controller which takes as input 
the reduced state variable x r , but controls the original system. 

V. Experiments 

We demonstrate an application of our method to a 7- 
dimensional nonlinear system with one dimensional input and 
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Fig. 1. (Left) Simulated output trajectories for the original and reduced (2-dimensional) system. (Right) Top Hankel singular values (zeros 
omitted). 



output appearing in [17] (Example 3.2, pg. 54): 



-4 



X-3 = ~X 3 
± 5 = XlX 2 X 3 

y = xi - 



u 

x 5 



X2 



x\x 2 



Zx\x 



x 4 = —x i + X\ — x 2 + x 3 + 



u 

2u 



-2xi 



x 6 -- 
2x 5 



x 5 - 
- x 7 



2u 



X3 + x 4 x 3 + x 5 — 2x e + 2xr 



Impulse and initial-condition responses of the system were 
simulated as described above, and 800 samples equally spaced 
in the time interval [0, 5s] were sampled to build the kernel 
matrices K c and K a using the third degree polynomial kernel 
K(x,y) = (1 + (x,y)) 3 . Recall that these kernel matrices, 
are the inner product counterparts to the empirical Gramians. 
Examples of K c and K for this system are shown in Figure [2] 
We imposed a small amount of regularization when computing 
the balancing transformation T, taking the Cholesky decom- 
position of K c + 0.001 • I instead of K c . Figure [T] (right 
pane) shows the Hankel singular values £ = TK C T T for 
this problem on a log scale. One can see that perhaps the 
first two components ought to capture most of the system's 
behavior. Thus the reduction map II was defined by taking 
only the eigenvectors (scaled columns of T) corresponding to 
the largest two Hankel singular values, giving a reduced state 
space of dimension two. 

Next, a map from the reduced variable x r to x was estimated 
following Section IIV-AI The control input was chosen to be 
a lOhz square wave, and 1000 samples from the simulated 
system in the interval [0, 5s] were mapped down using n 
and then used to solve the (n) RLS regression problems, one 
for each state variable, again using a third degree polynomial 
kernel. All initial conditions were set to zero. The desired 
outputs (dependent variable examples) used to learn / were 
taken to be the true / evaluated at the samples from the 
simulated state trajectory. We also added a bias dimension of 
l's to the data to account for any offset, and used a fast leave- 
one-out cross-validation (LOOCV) computation [19| to select 
the optimal regularization parameter. Two remarks are in order. 
The above dynamics can in fact be represented explicitly and 



exactly in a 3rd degree polynomial RKHS; only monomials 
up to degree 3 appear in the dynamics. Second, the control 
input is decoupled from the state. Both of these facts can be 
used to obtain an improved reduced model, however we did 
not make use of these special properties and instead applied 
the simplest version of the techniques described above which 
assume no special structure. 

We followed a similar process to learn the output function 
y = h(x r ). Here we used a 10Hz square wave control input, 
zero initial conditions and 700 samples in the interval [0, 5s]. 
For this function the Gaussian kernel K(x, y) = exp(— -y || a; — 
j/ 1 1 2) was used to demonstrate that our method does not rely on 
any particular match between the form of the dynamics and the 
type of kernel. The scale hyperparameter 7 was chosen to be 
the average distance between the training examples. We again 
used LOOCV to select the RLS regularization parameter. 

Finally, the closed system was simulated as described above 
using xq = and a control input different from those 
used to learn the dynamics and output functions: u(t) = 
i (sin(27r3t)-|-sq(27r5t— 7r/2)j where sq(-) denotes the square 
wave function. This input is shown at the top of Figure Q] (left 
panel). The Taylor series approximation for n was done once 
about xq and was not updated further. The simulated outputs 
y(t) of the closed reduced system as well as the output y(t) 
of the original system are plotted at the bottom in Figure [T] 
(left panel). One can see that, even for a significantly different 
input, the two dimensional reduced system closely captures 
the original system. The main source of error is seen to be 
over- and under-shoot near the square wave transients. This 
error can be further reduced by simulating the system for 
different sorts of inputs (and/or frequencies) and including the 
collected samples in the training sets used to learn n, / and h. 
Indeed, we have had some success driving example systems 
with random uniform input in some cases. 

VI. Conclusion 

In this paper we introduced a new model reduction method 
for nonlinear control systems. The method assumes that the 
nonlinear system is approximately linear in a high dimensional 
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Fig. 2. Plots of the kernel matrices encoding information about the empirical controllability Gramian matrix (left) and observability Gramian 
matrix (right). 



feature space, and carries out linear balanced truncation in 
that space. This leads to a nonlinear reduction map, which 
we suggest can be combined with representations of the 
dynamics and output functions by elements of an RKHS to 
give a closed reduced order dynamical system which captures 
the input-output characteristics of the original system. We 
then demonstrated an application of our technique to a 7- 
dimensional system and simulated the original and reduced 
models for comparison, showing that the approach proposed 
here can yield good low-order nonlinear reductions of strongly 
nonlinear control systems. We believe that techniques well 
known to the machine learning and statistics communities 
can offer much to control and dynamical systems research, 
and many further directions remain, including reduction of 
unstable systems and structure preserving systems. 
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